Abstract

The generation of random graphs using edge swaps provides a reliable method to draw uniformly 
random samples of sets of graphs respecting some simple constraints, e.g. degree distributions. 
However, in general, it is not necessarily possible to access all graphs obeying some given
constraints through a classical switching procedure calling on pairs of edges. We therefore propose to
get round this issue by generalizing this classical approach through the use of higher-order edge
switches. This method, which we denote by "k-edge switching", makes it possible to progressively
improve the covered portion of a set of constrained graphs, thereby providing an increasing,
asymptotically certain confidence on the statistical representativeness of the obtained sample.

Key words: graph algorithms, random graphs, edge switching, Markov-chain mixing, 
constrained graphs 



Introduction 

The generation of random graphs respecting some constraints has two direct applications: the 
modeling of realistic network topology when empirical data are missing, and the confirmation of 
the role of a given set of constraints in the presence of some empirically observed topological and 
structural features (i.e. some target observables, such as in e.g. [17]). There is however currently no 
general approach to directly create uniformly random graph samples given arbitrary constraints, 
except for some very specific cases usually related to degree distributions (in this paper, degree 
distribution refers to a specific sequence of degrees, as opposed to a probability distribution). 

Existing methods for generating random samples of a set of graphs $\mathcal{G}_C$ respecting a given set
of constraints $C$ indeed fall into two broad categories:

• Either by directly building a graph of $\mathcal{G}_C$ from scratch, i.e. randomly assigning links to pairs
of nodes such that the overall constraint is respected. For instance, the configuration model
provides random graphs by connecting half-links on nodes such that each
resulting graph respects a given prescribed degree distribution.

• Or by using an original graph which already belongs to $\mathcal{G}_C$ and iteratively reshuffling edges
of this graph while remaining in $\mathcal{G}_C$ at every step, in order to asymptotically converge, after
a "sufficient" number of iterations, to a uniformly random element of $\mathcal{G}_C$. This approach
of switching pairs of edges has been proposed for instance by [19], who aim at obtaining a
random graph with a given degree distribution by switching pairs of links in an initial graph
which already respects this constraint.

The asymptotic convergence is generally appraised empirically with respect to the target
observables. Besides, approaches based on edge swaps implicitly assume that the number
of nodes $N$, the number of edges $M$ and the degree sequence are part of $C$. In this case,
we consider that $C$ is the union of two subsets: $C = C^0 \cup C^+$, where $C^0$ refers to the
fundamental constraint forcing graphs to have $M$ links, $N$ nodes, a given degree sequence
and to be of a certain type (simple graphs, multigraphs, etc.), while $C^+$ refers to some
additional and arbitrary set of constraints, depending on the context.

While the former method poses a new design challenge for every new kind of constraint
(each set of constraints basically requires a new configuration model), the latter approach
raises the issue of obtaining uniformly random elements of $\mathcal{G}_C$. Put
differently and as we will see below, this reshuffling approach, which initially requires at least one
graph from $\mathcal{G}_C$, does not guarantee in general that the final graph is uniformly chosen from the
whole set $\mathcal{G}_C$.



We propose to both (i) appraise the potential issues and drawbacks of random graph creation
based on pairwise edge switching (Sec. 1), which is a relatively traditional method in the literature,
and then (ii) introduce a method for producing random, simulation-based samples of graphs for
arbitrary constraints $C$, using higher-order edge switching processes (Sec. 2). We eventually
present several practical and empirical illustrations in Sec. 3.



1. Edge swaps as a Markovian reshuffling process 

Miklos et al. [15] showed that it is possible to use a pairwise edge switching reshuffling algorithm
to generate a uniformly random sample of oriented graphs whose degree distributions are fixed.
This method was later called "switching and holding" (S&H). More precisely, this edge switching
method amounts to randomly choosing two links in the current graph, checking whether swapping
these links leads to a graph respecting the constraint and, if so, carrying out the corresponding swap;
otherwise, the current graph is "held" and the procedure is reiterated. Note that, as such, S&H differs
from a simple switching method in that it counts the number of swap trials rather than the
number of swaps.
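The S&H procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation of [15]: the function names, the `is_valid` callback and the example constraint `simple_digraph` are assumptions introduced here, and the arc list is copied at every trial for readability rather than efficiency.

```python
import random


def switch_and_hold(edges, n_trials, is_valid, rng=None):
    """Pairwise 'switching and holding' walk on a directed graph.

    edges    : iterable of (source, target) arcs of the starting graph
    n_trials : number of swap *trials* (failed trials also count as steps)
    is_valid : hypothetical callback telling whether a candidate arc list
               still satisfies the constraint
    """
    rng = rng or random.Random()
    arcs = list(edges)
    successes = 0
    for _ in range(n_trials):
        i, j = rng.sample(range(len(arcs)), 2)            # two distinct arcs
        (a1, b1), (a2, b2) = arcs[i], arcs[j]
        candidate = list(arcs)
        candidate[i], candidate[j] = (a1, b2), (a2, b1)   # exchange the targets
        if is_valid(candidate):
            arcs = candidate                              # switch
            successes += 1
        # otherwise hold: the walk stays on the current graph
    return arcs, successes


def simple_digraph(arcs):
    """Example constraint (assumed here): no self-loop and no multiple arc."""
    return all(a != b for a, b in arcs) and len(set(arcs)) == len(arcs)
```

Since only targets are exchanged, the out-degree and in-degree of every node are preserved by construction; only the additional constraints need to be tested.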

This procedure may be described as a walk in a Markov graph. The Markov graph is a directed
graph, allowing self-loops and multiple edges, whose set of nodes is exactly $\mathcal{G}_C$. Arcs of
the Markov graph are such that (i) whenever a valid pairwise edge switch transforms $G_i \in \mathcal{G}_C$
into $G_j \in \mathcal{G}_C$, we draw an arc from $G_i$ to $G_j$ (and vice versa, mechanically), and (ii) whenever a
pairwise edge switch transforms $G_i \in \mathcal{G}_C$ into a graph which does not belong to $\mathcal{G}_C$, we draw a
self-loop from $G_i$ to $G_i$. In this context, the reshuffling procedure is a random walk in the Markov
graph, that is, a Markov chain converging to an equilibrium distribution whose probabilities
can be obtained from the transition matrix of the Markovian process. If the Markov graph has
constant degrees (i.e. the in-degree and out-degree of all nodes of the Markov graph are all the
same), the reshuffling process is uniform. If the Markov graph is connected, all possible graphs
are reachable. If it is both connected and has constant degrees, the process leads to uniformly
random elements of $\mathcal{G}_C$. See an illustration on Fig. 1.

Edge switching methods have been used to generate random graph samples in various instances
and have been studied and improved in various directions.



To use such a switching method, one has nonetheless to ensure that all graphs of $\mathcal{G}_C$ are present
in the equilibrium distribution of the random walk with an identical probability, i.e. ensure that:

(i) all graphs of $\mathcal{G}_C$ are uniformly drawable, and

(ii) all graphs of $\mathcal{G}_C$ are exhaustively reachable.

Figure 1: Simple Markov graph for a constraint on a graph of (i) three nodes with (ii) given in- and out-degree
distributions and (iii) without multiple edges but possibly self-loops. Non-valid swaps are represented by self-loops
in this Markov graph, which has thus a constant degree.



Uniformity is guaranteed by the S&H approach within a given connected portion of the Markov
graph. While [15] show uniformity in the case of degree distribution constraints, the proof they
mention in Appendix A of the same reference can easily be extended to any kind of constraint. A
sketch of this proof is given by the following reasoning: "holding" on failed trials is equivalent to
connecting a Markov graph node to itself as many times as there are failure possibilities. Thus,
the in- and out-degree of all Markov graph nodes will be equal to the number of trials (both failed
and successful ones), which is strictly the same for every graph of $\mathcal{G}_C$, since it only depends on the
constant number of links of graphs of $\mathcal{G}_C$. Finally, for a random walk in a Markov graph where all
nodes have the same in- and out-degree, the probability of being on a given node is asymptotically
uniform.
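The last step of this sketch can be written out explicitly. The notation below ($m_{ij}$, $d$, $n$, $P$, $\pi$) is introduced here for illustration only and is not taken from [15]; convergence to $\pi$ additionally assumes that the chain is aperiodic on the component, which holds as soon as the component contains at least one self-loop.

```latex
% Assumed notation: m_{ij} = number of arcs from G_i to G_j in the Markov graph
% (self-loops included), d = common in- and out-degree, n = number of nodes
% of the connected component under consideration.
P_{ij} = \frac{m_{ij}}{d}, \qquad
\sum_{j} m_{ij} = d \ \ (\text{out-degree}), \qquad
\sum_{i} m_{ij} = d \ \ (\text{in-degree}).

% The uniform distribution \pi_i = 1/n is stationary:
(\pi P)_j = \sum_{i} \frac{1}{n}\,\frac{m_{ij}}{d}
          = \frac{1}{n}\cdot\frac{1}{d}\sum_{i} m_{ij}
          = \frac{1}{n} = \pi_j .
```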

Exhaustivity relates to the issue of whether the whole Markov graph is connected, i.e. the
existence of a path going from any node to any other node of the Markov graph. In Markov
chain terminology, the chain is then said to be irreducible. To our knowledge, existing theorems on
exhaustivity concern simple constraints $C$, essentially reduced to little more than the conservation
of the original degree sequence: i.e. in the case of trees [5], graphs, connected graphs [23] and
bi-connected graphs [24].



However, in the general case of more elaborate constraints (e.g. [14]), using the S&H method
appears to be less legitimate, since no such exhaustivity theorems are known. For instance, Rao et
al. show that extending $C$ by requiring the graph to have both directed edges and no self-loop
makes it impossible, in some cases, to reach all graphs of $\mathcal{G}_C$ by pairwise edge swaps. In particular,
no pairwise edge switch could indeed transform one of the following adjacency matrices into the
other one (forbidden self-loops are marked with a star):
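A standard pair of this kind is given by the two orientations of a directed 3-cycle; whether these are exactly the matrices meant here is an assumption on our part. The sketch below enumerates every switch on $k$ arcs (the function names are ours) and checks that no pairwise switch leaves the cycle 0→1→2→0, whereas a switch on three arcs reaches the reversed cycle.

```python
from itertools import combinations, permutations


def is_simple(arcs):
    """No self-loop and no multiple arc."""
    return all(a != b for a, b in arcs) and len(set(arcs)) == len(arcs)


def k_switch_neighbours(arcs, k):
    """All simple digraphs reachable from `arcs` by permuting the targets of k arcs.

    The starting graph itself (identity permutation) is excluded.
    """
    arcs = sorted(arcs)
    found = set()
    for idx in combinations(range(len(arcs)), k):
        targets = [arcs[i][1] for i in idx]
        for perm in permutations(targets):
            new = list(arcs)
            for i, b in zip(idx, perm):
                new[i] = (arcs[i][0], b)
            if is_simple(new) and set(new) != set(arcs):
                found.add(frozenset(new))
    return found


cycle = [(0, 1), (1, 2), (2, 0)]
reverse = frozenset([(0, 2), (2, 1), (1, 0)])
```

Every pairwise exchange of targets on `cycle` creates a self-loop, so `k_switch_neighbours(cycle, 2)` is empty, while `k_switch_neighbours(cycle, 3)` contains exactly `reverse`.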




Convergence of the walk. In addition to these issues, convergence speed remains an open
theoretical question [12], often coped with using practical heuristics [25]. As said before, the
walk usually aims at randomly drawing an element of $\mathcal{G}_C$ in order to check whether graphs of $\mathcal{G}_C$
exhibit some properties on the target observables (and, implicitly, in order to check whether $C$
could constitute a sufficient explanation for these observables). In other words, some measurements
are carried out on graphs of $\mathcal{G}_C$ so that the walk is generally considered to have performed
a "sufficient" number of steps when those measurements on the target observables apparently
plateau.

Figure 2: On the left (panels "side A" and "side B"), one possible realization of a graph drawn from $\mathcal{G}_{C_0}$: note that
B-sided nodes of the bipartite graph (marked by squares) have out-degree zero and A-sided nodes (marked by circles)
have in-degree zero. On the right (panel "projected graph"), the corresponding projection of this bipartite graph onto side A.

2. Higher-order switching process 

In this section, for the sake of clarity we focus on directed graphs, although it is effortless to 
formulate the whole argument for undirected graphs. 



2.1. k-edge switching 

In general, the disconnectedness of the Markov graph stems from the impossibility of transforming
a graph into another graph by a simple pairwise switching. To overcome this issue, we
propose an experimental method based on higher-order edge switchings: given $G \in \mathcal{G}_C$, let us
consider $k$ edges $(a_i, b_i)_{i \in \{1,\dots,k\}}$, corresponding to nodes $(a_1, \dots, a_k, b_1, \dots, b_k)$, possibly not all
distinct. The $k$-edge switching process, henceforth called "$k$-switch", amounts to randomly choosing
one permutation $\sigma$ among the $k!$ possible permutations of $(b_1, \dots, b_k)$. The resulting graph is such
that edges $(a_i, b_i)_{i \in \{1,\dots,k\}}$ are replaced with $(a_i, \sigma(b_i))_{i \in \{1,\dots,k\}}$ (for an example of pseudocode,
see Alg. 1).

It is immediate to see that neighbors of $G$ in the Markov graph corresponding to a classical
pairwise edge swap are also neighbors of $G$ in the Markov graph corresponding to a $k$-switch, when
considering a permutation that just swaps two targets $b_i$, $b_{i'}$. Similarly, when $k = 2$, we fall back on the
S&H approach.

For increasing values of $k$, this procedure creates new links in the Markov graph and new
neighbors appear (in the case of Fig. 1, it is easy to see that the Markov graph is complete for
$k = 3$). More importantly, some potentially disconnected components of the Markov graph may
thus become connected.



Illustration. To illustrate this higher-order switching process, let us consider the case of bipartite
(or 2-mode) graphs. Such graphs are useful in the context of real-world networks, for example to
study collaborations in social groups [17] or peer-to-peer exchange systems. Nodes belong to
one of two sides A and B, and links connect pairs of nodes from distinct sides only. It is possible
to build monopartite (or 1-mode) graphs from the bipartite one by keeping only A (resp. B) nodes
and linking them if they are connected to the same B (resp. A) node in the original bipartite
structure, as pictured on Figure 2. These graphs are called projections of the original bipartite
graph on side A (resp. B).
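The projection just described can be sketched in a few lines. The function names and the small example graph are hypothetical; the example has A-side out-degrees {2,2,2,1,1} and B-side in-degrees {3,2,2,1}, but it is not claimed to be the graph drawn on Figure 2.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_on_side_a(arcs):
    """Project a bipartite graph, given as (a, b) arcs from side A to side B, onto side A."""
    members = defaultdict(set)                 # B node -> A nodes linked to it
    for a, b in arcs:
        members[b].add(a)
    projected = set()
    for a_nodes in members.values():
        # two A nodes are linked as soon as they share at least one B node
        projected.update(combinations(sorted(a_nodes), 2))
    return projected


def degree_sequence(edges):
    """Sorted degree sequence of an undirected edge set (isolated nodes are not listed)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values(), reverse=True)


example = [("x", "b1"), ("y", "b1"), ("z", "b1"), ("u", "b2"),
           ("v", "b2"), ("x", "b3"), ("y", "b3"), ("z", "b4")]
projection = project_on_side_a(example)
```

On this example the projection is a triangle on x, y, z plus the edge u-v, so its degree sequence is {2,2,2,1,1}.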

Consider a case consisting of a constraint $C_0 = C_0^0 \cup C_0^+$ on bipartite graphs such that:

(i) $C_0^0$: the bipartite graph contains no multiple link and consists of two sides with fixed degree
distributions:

• "side A": 5 nodes, out-degree {2,2,2,1,1} (and in-degree 0);

• "side B": 4 nodes, in-degree {3,2,2,1} (and out-degree 0).

(ii) $C_0^+$: the degree distribution of the projected graph on side A is fixed: {2, 2, 2, 1, 1}.

Figure 3: Markov graph of $\mathcal{G}_{C_0}$ for various $k$-switching procedures: dashed blue arrows correspond to $k = 2$, plain
green arrows to $k = 4$. For readability purposes, we simplified the representation by discarding self-loops and
multiple edges of the Markov graph.

Put shortly, this constraint consists in simultaneously imposing degree distributions on a
bipartite graph and on one of its monopartite projections. An example of such a graph is represented
on Fig. 2. Given such a $C_0$, the Markov graphs corresponding to $\mathcal{G}_{C_0}$ contain 7 nodes. The Markov graph
for $k \geq 4$ is connected, while it actually consists of 3 disconnected components for $k \in \{2, 3\}$
(see Fig. 3).

We chose this practical case in part because the Markov graph is still small enough to be 
visualized for each value of k. In the remaining examples, it will not be possible anymore, and no 
theoretical proof is available; we therefore rely on experimental investigations. 

2.2. Relationship between k and exhaustivity 

There is an upper bound on $k$ such that the Markov graph is assuredly connected and the
underlying walk is exhaustive/irreducible. In particular, given two graphs $G_1$ and $G_2$ of $\mathcal{G}_C$, there
always exists a permutation of size at most $M$ (the number of edges) such that $G_1$ is transformed
into $G_2$.

Proof. The $M$ edges of $G_1$ can be written as $\{(a_1, b_1); (a_2, b_2); \dots; (a_M, b_M)\}$. Similarly, in $G_2$,
because both $M$ and the degree sequence are fixed, we can write that $M$ edges originate from
the same family $(a_i)_{i \in \{1,\dots,M\}}$ to another family $(b'_i)_{i \in \{1,\dots,M\}}$, i.e. these edges can be written as
$\{(a_1, b'_1); (a_2, b'_2); \dots; (a_M, b'_M)\}$. Because the degree sequence is fixed, families of nodes $b$ and
$b'$ contain exactly the same nodes repeated the same number of times. Thus, $\sigma$ defined as
$(b_1, b_2, \dots, b_M) \xrightarrow{\sigma} (b'_1, b'_2, \dots, b'_M)$ is then a valid $M$-switch permutation which does transform
$G_1$ into $G_2$. □
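The construction used in this proof can be made concrete as follows. This is a sketch under the assumption that both graphs are given as lists of (source, target) arcs with sortable node labels; the function names are ours.

```python
from collections import defaultdict


def m_switch_permutation(arcs_1, arcs_2):
    """Return (g1, sigma) such that giving arc i of g1 the target of arc sigma[i] yields G2.

    g1 is the arc list of G1 sorted by source, so that both graphs list the
    same sources in the same order (same out-degree sequence).
    """
    g1, g2 = sorted(arcs_1), sorted(arcs_2)
    if [a for a, _ in g1] != [a for a, _ in g2]:
        raise ValueError("out-degree sequences differ")
    pool = defaultdict(list)                   # target -> indices of arcs of g1 holding it
    for index, (_, b) in enumerate(g1):
        pool[b].append(index)
    try:
        sigma = [pool[b].pop() for _, b in g2]
    except IndexError:
        raise ValueError("in-degree sequences differ") from None
    return g1, sigma


def apply_permutation(arcs, sigma):
    """Apply the M-switch: arc i keeps its source and takes the target of arc sigma[i]."""
    return [(a, arcs[s][1]) for (a, _), s in zip(arcs, sigma)]
```

Because each target of $G_1$ is consumed exactly once, `sigma` is a permutation of the $M$ arc indices, as the proof requires.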

The number of connected components of the Markov graph is thus a monotonically non-increasing
function of $k$, which reaches one at the latest for $k = M$. As increasing $k$ guarantees a better coverage of
the Markov graph, the relevance of this method lies essentially in improving the confidence in
the random mixing achieved by rewiring procedures, rather than in addressing convergence speed
issues.^1



^1 In practice, increasing $k$ comes however at the price of an increasingly slow convergence of the walk, in terms

2.3. Data storage format

One of the first requirements for the data format is to enable quick random selection of edges
and subsequent edge switches, i.e. updates of the graph. A straightforward option for drawing
random links consists in using an array of edges, and picking a random index within
the array. To store the graph, by contrast, we opt for an adjacency list, especially because
constraint checking often requires access to the neighbors of a given node (which is
possible in $O(\delta)$, where $\delta$ is the node degree). Eventually, we thus maintain and update two data
structures: an array of edges and an adjacency list. These two data structures have a comparable size and
are respectively most efficient for link selection and graph operations.
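A possible layout of these two structures is sketched below. The class and method names are hypothetical, and Python sets stand in for the adjacency lists, which is a substitution made here for brevity.

```python
import random


class SwitchableDigraph:
    """Directed simple graph stored twice: an array of arcs and an adjacency structure."""

    def __init__(self, arcs):
        self.arcs = list(arcs)                 # array of edges: uniform draw by index
        self.succ = {}                         # node -> set of successors
        for a, b in self.arcs:
            self.succ.setdefault(a, set()).add(b)
            self.succ.setdefault(b, set())

    def random_arc_index(self, rng=random):
        """Index of a uniformly random arc."""
        return rng.randrange(len(self.arcs))

    def apply_switch(self, indices, new_targets):
        """Give arc indices[i] the target new_targets[i], updating both structures.

        The switch is assumed to have been validated beforehand (simple result).
        All old arcs are removed before any new one is added, so that arcs
        sharing a source do not overwrite each other in the adjacency sets.
        """
        old = [self.arcs[i] for i in indices]
        for a, b in old:
            self.succ[a].discard(b)
        for i, (a, _), b in zip(indices, old, new_targets):
            self.succ[a].add(b)
            self.arcs[i] = (a, b)
```

Keeping both structures in sync costs a constant amount of work per switched arc, which is what keeps the switching step itself in $O(k)$.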

2.4. Complexity 

Carrying out a $k$-switch in $G \in \mathcal{G}_C$ consists in:

1. Finding $k$ random edges in $G$ represented as an adjacency list, in $O(k)$;

2. $k$-switching their extremities into a resulting graph $G'$, in $O(k)$;

3. Verifying that $G'$ respects the constraint set, i.e. $G' \in \mathcal{G}_C$, in $O(f_{\mathcal{G}_C})$, which depends on the
design of the constraint check.

$C$ should be such that there exists a tractable check on any graph of $\mathcal{G}_C$.^2 In the best cases, when it
is possible to check incrementally whether $G' \in \mathcal{G}_C$ with respect to the $k$ switched edges only, $f_{\mathcal{G}_C}$
belongs to $O(k)$. The complexity of doing $n$ trials of $k$-switches is thus at least $O(nk)$.
Additionally, target observables have to be computed at regular intervals to monitor their asymptotic
convergence. Those target observables should also be chosen to be tractable. If, moreover,
the observation frequency is chosen to be sufficiently low, constraint checking dominates the
overall running time.

Algorithm 1: Pseudocode of the $k$-switching procedure in the case of a directed network
with constraints: degree distributions, no self-loops, no multiple arcs and a set of constraints
$C^+$ (associated to the set $\mathcal{G}_{C^+}$), which depends on the context.

input : graph $G_0 = (V_0, E_0)$, stored as an array of adjacency lists; number of switching
        trials: $n$; size of the switches: $k$;

output: graph $G$ produced by $n$ attempts of switching;

$G = (V, E) \leftarrow G_0$ ;                                   // initialization

for $j \leftarrow 1$ to $n$ do

    draw randomly $k$ different arcs: $\{(a_i, b_i)\}_{i \in I} \in E$ ;
    draw randomly $\sigma$, a permutation of the index set $I$ ;
    build the set of swapped arcs $\{(a_i, b_{\sigma(i)})\}_{i \in I}$ ;
    $E' \leftarrow E \cup \{(a_i, b_{\sigma(i)})\} \setminus \{(a_i, b_i)\}$ ;
    define $G' = (V, E')$ ;

    define $\forall i \in I$, $W_i = \{b : \exists\, (a_i, b) \in E\} \setminus \{b_i\}$ ;

    if   $\forall i$, $a_i \neq b_{\sigma(i)}$                   // test no self-loops

    and  $\forall i$, $b_{\sigma(i)} \notin W_i$                 // test no multiple arcs

    and  $G' \in \mathcal{G}_{C^+}$                              // test constraint $C^+$

    then $G \leftarrow G'$ ;
end
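For readers who prefer runnable code, Algorithm 1 can be transcribed roughly as below. This is a sketch, not the authors' implementation: the function name and the `extra_constraint` callback (standing for the test $G' \in \mathcal{G}_{C^+}$) are assumptions, the multiple-arc test is performed on the whole candidate arc set rather than through the sets $W_i$, and a draw that leaves the graph unchanged is counted as a hold rather than as a success.

```python
import random


def k_switch_walk(arcs, n, k, extra_constraint=lambda candidate: True, rng=None):
    """Run n trials of k-switches on a directed simple graph given as (source, target) arcs."""
    rng = rng or random.Random()
    arcs = list(arcs)
    arc_set = set(arcs)
    successes = 0
    for _ in range(n):
        idx = rng.sample(range(len(arcs)), k)          # k different arcs
        old = [arcs[i] for i in idx]
        targets = [b for _, b in old]
        rng.shuffle(targets)                           # uniformly random permutation sigma
        new = [(a, b) for (a, _), b in zip(old, targets)]
        if new == old:                                 # graph unchanged: hold
            continue
        if any(a == b for a, b in new):                # test no self-loops
            continue
        untouched = arc_set.difference(old)
        if len(set(new)) < k or untouched.intersection(new):   # test no multiple arcs
            continue
        candidate = list(arcs)
        for i, arc in zip(idx, new):
            candidate[i] = arc
        if not extra_constraint(candidate):            # test constraint C+
            continue
        arcs, arc_set = candidate, untouched.union(new)
        successes += 1
    return arcs, successes
```

The arc list is copied at each candidate for clarity; an incremental update of the two data structures of Sec. 2.3 would avoid that cost.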

The reason why large values of $k$ are not necessarily advisable actually lies in the possibility of
$k$-switch failures, i.e. such that the resulting graph does not belong to $\mathcal{G}_C$ anymore and thus the
walk stays on the same graph at the next step. Odds of such failure depend in a complicated way
on $k$: on one hand, when increasing $k$ we are allowing new types of switches, therefore allowing
access to possibly more graphs from a given graph of $\mathcal{G}_C$. On the other hand, many of these new
possible $k$-switches are also likely to fail (i.e. fall on a graph which does not belong to $\mathcal{G}_C$), because
they alter the graph more deeply (i.e. more deeply than $k'$-switches for $k' < k$). In the end, the
proportion of $k$-switch successes generally depends on $k$ in a non-monotonic manner.

In practice, given an a priori fixed number of trials, we observe that the number of successful
alterations tends to decrease sharply for large values of $k$ (as shown below in the tables of Sec. 3). In other
words, high-order alterations apparently make the walk stay longer on a given graph, although at
the same time successful alterations reshuffle the graph more strongly. Put shortly, with increasing
$k$, the walk is more likely to stagnate, but when it does not, it is more likely to lead to more different
graphs.

^1 (continued) of switch trials, as detailed in the following subsection on complexity.

^2 Various optimizations of this very step are open to a discussion which depends on the chosen external set of
constraints $C$, but are obviously outside the scope of the present paper. In particular, we assume that $f_{\mathcal{G}_C}$ is not
e.g. exponential in $N$ or $M$.

2.5. Random graph sampling using k-switches 

It is therefore hard to assess whether the mixing achieved by a $k$-switch-based walk of given
length is more efficient or not for higher values of $k$. However, the number of connected components
of the Markov graph is monotonically decreasing with $k$: increasingly connected portions of $\mathcal{G}_C$
are visited with increasing values of $k$. Because of that, it is relevant to propose an asymptotic
approach on $k$.

More precisely, a $k$-switch walk is stopped when some measures on $\mathcal{G}_C$ apparently plateau to
some values. Starting with the traditional case $k = 2$, we thus progressively increase $k$ up to a
"sufficient" value, i.e. such that the measurements appear to plateau from some $k_0$, as is classical
in asymptotic convergence of simulation-based methods. As we will see in the following section,
it seems empirically that even very small values of $k$ are often satisfactory.
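One simple way of reading off such a $k_0$ from the limit measurements is sketched below. The function name, the `tolerance` and `min_points` parameters and the tolerance values used in the example are assumptions made for illustration; the estimates are taken from rows of Tables 4 and 2 below.

```python
def plateau_threshold(estimates, tolerance, min_points=2):
    """Smallest k from which all later estimates stay within `tolerance` of each other.

    estimates  : dict mapping k to the plateau value measured for that k
    tolerance  : largest spread accepted among the estimates for k >= k0
    min_points : minimum number of values of k required to call it a plateau
    Returns None when no such k exists.
    """
    ks = sorted(estimates)
    for start, k0 in enumerate(ks):
        tail = [estimates[k] for k in ks[start:]]
        if len(tail) >= min_points and max(tail) - min(tail) <= tolerance:
            return k0
    return None


# A row of Table 4 (constraint C3): the measurement only stabilizes from k = 3
k0_c3 = plateau_threshold({2: 410, 3: 483, 4: 483, 5: 481, 6: 482}, tolerance=6)
# A row of Table 2 (constraint C2): the measurement is already stable at k = 2
k0_c2 = plateau_threshold({2: 59, 3: 56, 4: 58, 5: 58, 6: 59}, tolerance=5)
```

In practice the tolerance would be chosen from the confidence intervals of the measurements rather than fixed by hand.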

3. Illustrations on practical cases 

In addition to the earlier toy example $C_0$ shown on Fig. 3 on an extremely small graph, we now
illustrate this asymptotic approach on four practical cases for various kinds of constraints. For
the sake of clarity, we gathered in the Appendix the descriptions of constraint checking algorithms
and their respective complexity. Note that, here, we only consider constraints on graphs without
multiple edges; the higher-order switching approach may nonetheless be used in the context of
multigraphs.

3.1. Constraint based on oriented and colored triangles 

We first suggest a quite fictitious constraint $C_1$ such that:

(i) $C_1^0$: the graph is directed and made of $N$ nodes, each one having one outgoing and one
incoming arc;

(ii) $C_1^+$:

• nodes are equally divided into 3 groups of $N/3$ nodes, each denoted with a color: red
(R), green (G), or blue (B);

• the graph is made of $N/3$ isolated and oriented cycles of 3 nodes (i.e. $N/3$ isolated
triangles such that each node points to a single other node of the triangle).

Graphs of $\mathcal{G}_{C_1}$ are thus such that each node has exactly an in-degree of 1 and an out-degree
of 1. Suppose we want to randomly draw an element of $\mathcal{G}_{C_1}$ using $k$-switches, starting with an
initial graph $G_0$ such that each triangle is "R-G-B-oriented", i.e. a red node points to a green one
which points to a blue one which points to the red one.

For $k = 2$, the only possible $k$-switch is identity, so that in the Markov graph it is not possible
to leave $G_0$. For $k = 3$, possible $k$-switches reshuffle links within a given triangle, as illustrated
on Fig. 4: the associated walk can only lead to "R-G-B-oriented" and "R-B-G-oriented" triangles.
For $k = 4$, link exchanges are possible between triangles, so that eventually all combinations of
colored triangles are possible (including non-trichromatic triangles "R-R-R", "R-G-G", etc.).^3



^3 The corresponding Markov graph is thus connected for $k = 4$, which hence happens much before $k = M$.

Figure 4: Left: Illustration of the increasing possibilities of $k$-switches for $k \in \{2, 3, 4\}$ in the case of "R-B-G"
triangles. Right: Number of "R-B-G" triangles with respect to the number of $k$-switch trials (horizontal axis: trials,
in millions), for $k \in \{2, 3, 4\}$ (averages and corresponding confidence intervals computed over 10 000 simulations for each $k$).



Table 1: Proportion of triangles of each type with respect to $k$, averaged over 10000 completed simulations
consisting of 10* trials, including the respective mean number of effectively successful $k$-switches. The last column
features the theoretical average value over all graphs of $\mathcal{G}_{C_1}$.

Type        k = 2     k = 3       k = 4         k = 5         k = 6       Theoretical ($\mathcal{G}_{C_1}$)
R-R-R       0.        0.          0.036         0.036         0.036       0.036
G-G-G       0.        0.          0.036         0.036         0.036       0.036
B-B-B       0.        0.          0.036         0.036         0.036       0.036
R-G-G       0.        0.          0.111         0.111         0.111       0.111
R-B-B       0.        0.          0.111         0.111         0.111       0.111
G-G-B       0.        0.          0.111         0.111         0.111       0.111
G-B-B       0.        0.          0.111         0.111         0.111       0.111
R-R-B       0.        0.          0.111         0.111         0.111       0.111
R-R-G       0.        0.          0.111         0.111         0.111       0.111
R-B-G       0.        0.500       0.113         0.113         0.113       0.113
R-G-B       1.000     0.500       0.113         0.113         0.113       0.113
Successes             997 ± 74    2643 ± 108    2067 ± 132    936 ± 55


Considering a trivial target observable which is the proportion of triangles of a given
color-orientation, we now compare the performance of $k$-switch-based walks for $k \in \{2, 3, 4, 5, 6\}$. Using
simulations on graphs of $N = 180$ nodes, we consider the plateauing values of each walk, as shown
on Fig. 4. We then gather in Tab. 1 the various averages of such values obtained over 10000
simulations for each $k$. We see that average values plateau from $k = 4$ and generally fit well
the theoretical values, which can be analytically computed for $C_1$ (see also Tab. 1). However,
values obtained for $k = 2$ (classical S&H) and $k = 3$ are significantly different from the theoretical
values, indicating that the corresponding Markov processes are unable to reach every graph of the
set $\mathcal{G}_{C_1}$. In particular, the classical S&H method cannot be used in the case of $C_1$ to generate a
random sample, whereas the multiple edges switching method with $k \geq 4$ is reliable.
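The theoretical column of Tab. 1 can be recovered with a short counting argument, sketched here under the assumption that, for a uniformly random element of $\mathcal{G}_{C_1}$, every triple of nodes is equally likely to form a triangle and both orientations of a triangle are equally likely. The function name is ours.

```python
from math import comb


def theoretical_triangle_proportions(n_nodes):
    """Expected proportions of triangle types among the N/3 triangles.

    Returns (monochromatic, two_plus_one, trichromatic_oriented), i.e. the value
    for each of R-R-R, G-G-G, B-B-B, for each of the six two-colour types such
    as R-G-G, and for each of the two orientations R-G-B and R-B-G.
    """
    per_colour = n_nodes // 3
    triples = comb(n_nodes, 3)
    monochromatic = comb(per_colour, 3) / triples
    two_plus_one = per_colour * comb(per_colour, 2) / triples
    trichromatic_oriented = per_colour ** 3 / triples / 2
    return monochromatic, two_plus_one, trichromatic_oriented


mono, two_one, tri = theoretical_triangle_proportions(180)
```

For $N = 180$ this gives approximately 0.036, 0.111 and 0.113, and the eleven types sum to one (3 monochromatic, 6 two-colour and 2 oriented trichromatic types).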

Such apparently arbitrary constraints can actually be relevant when considering e.g. complex
molecular edifices modeled as graphs linking molecules according to chemical constraints [18].

3.2. Constraint based on correlations of degrees 
We now consider constraint $C_2$ imposing that:

• $C_2^0$: the graph is directed, without self-loops nor multiple edges and has a fixed degree
sequence,

• $C_2^+$: the distribution of out-degree correlations between pairs of connected nodes is fixed.
In other words, the number of links connecting nodes of some out-degree to nodes of
some (possibly distinct) out-degree remains the same across the set of graphs.
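A direct, non-incremental way of checking $C_2^+$ is to compare a "signature" counting arcs per pair of out-degrees, as sketched below. The function name and the small example are hypothetical, and the constraint-checking algorithm actually used by the authors may differ.

```python
from collections import Counter


def out_degree_correlations(arcs):
    """Number of arcs for each (out-degree of source, out-degree of target) pair."""
    out_degree = Counter(a for a, _ in arcs)
    # Counter returns 0 for nodes without outgoing arc
    return Counter((out_degree[a], out_degree[b]) for a, b in arcs)


before = [(0, 1), (0, 2), (1, 3), (4, 3)]
kept = [(0, 1), (0, 3), (1, 3), (4, 2)]      # targets of (0, 2) and (4, 3) exchanged
broken = [(0, 3), (0, 2), (1, 3), (4, 1)]    # targets of (0, 1) and (4, 3) exchanged
```

Here `kept` has the same signature as `before`, so the switch would be accepted, whereas `broken` changes it and the walk would hold. Since switches preserve out-degrees, an incremental check only needs to recount the $k$ switched arcs.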

The practical interest of this constraint becomes explicit in the empirical case of a hyperlink
citation network. In qualitative terms, this constraint should in effect help in appraising how much
correlations in citing activities (in terms of out-degrees) explain the existence of cyclic citation
patterns (in terms of directed triangles). To this end, we start with an initial graph $G_0$ extracted
from the 50,000 first web pages of a publicly available network database^4; we denote this database
WWW. We carry out one billion trials in each walk corresponding to $k$-switches for $k \in [2, 6]$. We
measure the average number of directed triangles (i.e. oriented cycles of length 3) of graphs of
$\mathcal{G}_{C_2}$, thereby estimating how much $C_2$ contributes to this kind of topological pattern. Results are
gathered in Tab. 2 and Fig. 5.

Figure 5: Number of directed triangles with respect to the number of $k$-switch trials ($k \in [2, 6]$).



Table 2: Number of directed triangles with respect to $k$, averaged over 50 completed walks consisting of 1 billion
trials, and respective number of effectively successful $k$-switches. Standard deviations are generally negligible and
never exceed 5% of the observed mean.

Target observables   Starter $G_0$   k = 2       k = 3       k = 4       k = 5       k = 6
[motif]              50.77·10^3      1.92·10^3   1.91·10^3   1.91·10^3   1.92·10^3   1.91·10^3
[motif]              31.70·10*       2.90·10*    2.88·10*    2.89·10*    2.90·10*    2.88·10*
[motif]              15,423          59          56          58          58          59
Successes                            6.96·10''   8.22·10'    5.28·10''   2.50·10'    1.00·10''



In spite of their diverse convergence speeds and success rates, walks converge for all $k \in \{2, 3, 4, 5, 6\}$
to the same average number of such directed triangles. As is usually the case with random graphs
with constraints, and contrarily to the previous example, we are trying to empirically estimate
the theoretical average of this measure on $\mathcal{G}_{C_2}$. We therefore assume that the plateauing of limit
measures for increasing $k$ is a sufficient indication that this empirical estimate can be trusted,
which is classical with simulation-based convergence; similarly, the user may also decide to
extend simulations to higher values of $k$. These results suggest that the reshuffling process is likely
to cover $\mathcal{G}_{C_2}$ well even for $k = 2$, i.e. traditional edge swaps. As such, the $k$-switch approach
provides an increasing confidence in the simulation estimate of this measure. Qualitatively, because
average observable values for $\mathcal{G}_{C_2}$ do not match those of $G_0$, we have additional confidence in
the interpretation that correlations in citation activities do not suffice to explain cyclic citation
patterns.

To get some insights on how the convergence process varies with input size, we implement the
algorithm on smaller samples of this dataset: the first 20,000 and 10,000 pages. Corresponding
results are gathered in Table 3, providing information about computational requirements in the
various cases.^5 As will also be the case in the next examples, it seems to be difficult to find
any obvious relationship between input size and the number of trials necessary to observe the
convergence.

^4 Available from http://www.barabasilab.com/rs-netdb.php

Table 3: Experimental values obtained for constraint $C_2$ on different inputs (with $N$: number of nodes, $M$:
number of arcs): minimum $k$ measured to obtain a uniformly random sample, approximate amount of trials needed
for convergence, maximum memory space used during the process.

Input      N        M         k threshold   approximate number of trials (m = million)   memory used
WWW-50K    50,000   143,592   2             ~ 1000m                                      13 MB
WWW-20K    20,000   63,224    2             ~ 250m                                       8 MB
WWW-10K    10,000   36,970    2             ~ 250m                                       5 MB




3.3. Constraint based on triangles 

As said above, it is straightforward to apply the method with constraints on undirected graphs.
$C_3$, and $C_4$ below, are of this kind.
$C_3 = C_3^0 \cup C_3^+$ is such that:

• $C_3^0$: the graph is undirected, with a fixed degree distribution, has no multiple edges nor
self-loops

• $C_3^+$: the number of (undirected) triangles remains the same.
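Checking this constraint requires counting triangles. A simple non-incremental count is sketched below, assuming the graph is given as a list of undirected edges, each listed once and without self-loops; the function name is ours and this is not necessarily the check used by the authors.

```python
def count_triangles(edges):
    """Number of triangles of an undirected simple graph given as a list of edges."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # every common neighbour of the two ends of an edge closes a triangle,
    # and each triangle is found once per edge, i.e. three times
    return sum(len(neighbours[u] & neighbours[v]) for u, v in edges) // 3


triangle_with_pendant = [(0, 1), (1, 2), (0, 2), (2, 3)]
complete_graph_k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

An incremental variant would only recount the common neighbours of the endpoints of the removed and added edges.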

The interest of $C_3$ can be illustrated in the case of a collaboration network. The number of
distinct motifs of size four will be our target observable. In that case, $C_3$ practically aims at
checking whether the size and shape of the close neighborhood of a scientist in this field is related
to the cohesiveness between agents, that is, more precisely, at checking how the tendency towards
triangular interactions influences the number and connectedness of neighbors at distance 1 and 2.

$G_0$ is an undirected graph of collaborations between scientists extracted from the Anthropological
Index Online database.^6 The dataset we use focuses on a specific subfield consisting of
Scandinavian archeology-related papers published over the period 2000-2009: nodes are paper
authors, links feature collaborations between authors in these papers. $G_0$ contains 273 individuals
and 280 links.

Results of the corresponding exploration of the random graph space defined by $C_3$ are gathered
in Fig. 6 and Tab. 4 for motifs of size four, for which there is significant variation from $G_0$ for
$k > 2$. More importantly, these diverging results do not appear when using $k = 2$, but only appear
from $k > 2$, being then similar for all $k \in \{3, 4, 5, 6\}$. Thus, the usual S&H method, unlike
the generalized switching method with $k \geq 3$, cannot be used to generate a uniformly random
subset of $\mathcal{G}_{C_3}$ on this particular dataset: the obtained sample would be significantly biased. In
other words, only going beyond $k = 2$ makes it possible to show that $C_3$ is not sufficient to
explain the particular shape of the neighborhood of these agents in this empirical network.

In Table 5 we gather results on the convergence process on larger collaboration databases
extracted from the AIO database in other geographical areas, namely the British Isles and the
whole of Europe, over the same period of time. Qualitative results on the relationship between
$C_3$ and target observables hold, yet there is, again, no obvious relationship between convergence
and input size and type.

3.4. Constraint based on connected components 

Finally, $C_4$ addresses the issue of connected components. $C_4$ is such that:



^Computations have been made using a standard computer (2x2.33GHz processor, 2GB memory). 
^Available from http://aio.anthropology.org.uk/aiosearch 
Figure 6: Cumulative mean number of 4-node cycles for C3. 
Table 4: Mean number of motifs of size four after 20 simulations of 10 billion trials on $G_0$ from the AIO database. 

Target observables        Starter $G_0$   k = 2       k = 3       k = 4       k = 5       k = 6
[motif 1]                 2794            2799 ± 4    2907 ± 53   2933 ± 32   2942 ± 64   2894 ± 42
[motif 2]                 406             410 ± 3     483 ± 6     483 ± 5     481 ± 5     482 ± 6
[motif 3]                 730             734 ± 3     843 ± 9     841 ± 10    841 ± 6     840 ± 8
[motif 4]                 108             108 ±       120 ± 2     120 ± 2     120 ± 2     119 ± 2
Successes (in millions)                   79          166         96          34          8

Table 5: Experimental values obtained for constraint C3 on different inputs. 

Input           N       M      k threshold   approximate number of trials   memory used
Scandinavia     273     280    3             ~20,000m                       2 MB
British Isles   807     1020   2             ~10,000m                       2 MB
Europe          12112   9090   2             ~100,000m                      3 MB

• $C_4^1$: the graph is undirected, with a fixed degree distribution, and has neither multiple edges nor 
self-loops 

• $C_4^2$: the distribution of the sizes of connected components remains the same 

$G_0$ is an undirected graph built from human metabolic pathways listed in the Biocyc database^: 
each node is a protein, and each link connects any two proteins involved in the same biochemical 
pathway. It features 679 nodes and 11030 links. C4 aims at checking whether the existence of 
islands of pathways, as represented by connected components, is correlated with the presence of 
particular local, short-range interaction patterns between specific proteins. 

Simulation results are featured in Tab. 6: averages of statistical variables obtained over 
corresponding explorations of $\mathcal{G}_{C_4}$ do not match those of $G_0$. This suggests that C4 is not a possible 
explanation for the presence of 3- and 4-sized local patterns in this metabolic pathway network. 

In this case, going beyond k = 2 did not yield any particular improvement on the random 
mixing process results, yet provided a stronger confidence in the random exploration of $\mathcal{G}_{C_4}$. 

Again, we ran the algorithm on other network datasets: biochemical pathways of Aquifex 
aeolicus (denoted aaeo) and Burkholderia pseudomallei (bpse), see Table 7. Qualitative results 
hold too, while there is still no obvious relationship between convergence features and input size 
and type. 



^http://www.biocyc.org 
Figure 7: Number of undirected 4-node paths with respect to the number of k-switch trials ($k \in [2, 7]$) for C4. 



Table 6: Mean number of patterns of size 3 and 4 on 50 simulations involving 200 000 trials on $G_0$ for 'Pathways'. 

Target observables   Starter $G_0$   k = 2       k = 3       k = 4       k = 5       k = 6       k = 7
[pattern 1]          161.3·10^3      51.7·10^3   51.7·10^3   51.7·10^3   51.7·10^3   51.7·10^3   51.7·10^3
[pattern 2]          2070·10^3       178·10^3    178·10^3    178·10^3    178·10^3    178·10^3    177·10^3
[pattern 3]          34.5·10^?       31.1·10^?   31.1·10^?   31.1·10^?   31.1·10^?   31.1·10^?   31.1·10^?
Successes                            42,300      60,400      52,800      38,100      25,500      20,400

Table 7: Experimental values obtained for constraint C4 on different inputs. 

Input    N       M        k threshold   approximate number of trials   memory used
aaeo     264     1,193    2             ~20,000                        2 MB
Human    679     11,030   2             ~200,000                       3 MB
bpse     1,447   20,620   2             ~500,000                       6 MB



Conclusion 

Pairwise edge swapping methods, such as S&H, are relevant to generate uniformly random 
samples of graphs in some simple cases, such as fixed degree distributions. As constraints get stronger 
than just degree distributions, pairwise edge swaps may not be appropriate since the corresponding 
Markov graph is likely to be disconnected. We therefore proposed a higher-order switching method, 
denoted "k-edge switching", which makes it possible to tackle this issue by progressively improving 
the connectedness of the Markov graph of the corresponding walk. 
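
To make the idea of a k-edge switch concrete, here is a minimal Python sketch of a single trial on an undirected simple graph. It assumes that a k-switch selects k links at random, orients each of them at random and permutes their heads; the exact procedure is defined earlier in the paper and may differ in details. The function name `k_switch_trial` and the optional `constraint_ok` callback are hypothetical names introduced for this illustration only.

```python
import random

def k_switch_trial(edge_list, k, rng, constraint_ok=None):
    """Attempt one k-edge switch on an undirected simple graph.

    edge_list: list of (u, v) tuples. Returns the new edge list on success,
    or None if the trial is rejected. Assumed procedure, not the paper's code.
    """
    idx = rng.sample(range(len(edge_list)), k)       # k distinct links
    chosen = set(idx)
    # Give each selected undirected link a random orientation (tail, head).
    oriented = [edge_list[i] if rng.random() < 0.5 else edge_list[i][::-1]
                for i in idx]
    tails = [a for a, _ in oriented]
    heads = [b for _, b in oriented]
    rng.shuffle(heads)                               # random permutation of the heads
    proposed = list(zip(tails, heads))

    # Links that stay in place; rebuilt at every call for clarity (O(M)).
    untouched = {frozenset(e) for j, e in enumerate(edge_list) if j not in chosen}
    new_links = set()
    for a, b in proposed:
        link = frozenset((a, b))
        # Reject self-loops and multiple edges.
        if a == b or link in untouched or link in new_links:
            return None
        new_links.add(link)

    candidate = list(edge_list)
    for i, link in zip(idx, proposed):
        candidate[i] = link
    # Optional user-supplied test for the additional constraint (e.g. C3 or C4).
    if constraint_ok is not None and not constraint_ok(candidate):
        return None
    return candidate
```

Every node keeps its degree because each tail and each head is reused exactly once. Note that the identity permutation is accepted and returns an unchanged graph, which is one possible convention among several.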

While this approach guarantees that it is theoretically possible to navigate uniformly 
throughout the whole Markov graph for some value of k, for high values of k the process is likely to 
be empirically less and less practicable. This approach nonetheless constitutes an easily 
implementable method to incrementally explore larger portions of the Markov graph, thereby 
obtaining an increasing, asymptotically certain confidence in the representativeness of the obtained 
sample. In particular, this method potentially generates random graphs for any type of constraint 
preserving degree distributions. It also makes it possible to incrementally check the robustness 
of results obtained using traditional edge swaps with k = 2 (which have no reason to yield valid 
results as such), thereby improving the confidence in the Markov graph exploration achieved by 
2-switches. 

Put simply, when average measurements on the reshuffled graphs tend to plateau for some 
successive values of k, we suggest that it is empirically sensible to assume that the walk covers a 
reasonably representative portion of the graph set $\mathcal{G}_C$, as such constituting a useful extension 
of edge swapping random graph generation approaches. In this respect, an interesting perspective 
for the present work would be to find classes of constraints C for which some low values of k 
guarantee the connectedness of the k-switch Markov graph. 
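
The plateau heuristic can be illustrated with a short Python sketch. It assumes that the "±" values reported in the tables can be treated as a spread around the mean, and that two successive values of k agree when their means differ by no more than the sum of their spreads; both the agreement rule and the function name `plateau_start` are assumptions made for this illustration, not a criterion stated in the paper. The sample values are those of the second motif row of Table 4.

```python
def plateau_start(results, tol=1.0):
    """Return the smallest k from which all successive means agree.

    results: list of (k, mean, spread) tuples sorted by increasing k.
    Two successive entries agree if |mean_i - mean_j| <= tol * (spread_i + spread_j).
    Returns None if the last two values of k do not agree.
    """
    start = None
    for (k1, m1, s1), (k2, m2, s2) in zip(results, results[1:]):
        if abs(m1 - m2) <= tol * (s1 + s2):
            if start is None:
                start = k1          # a new run of agreeing values begins here
        else:
            start = None            # agreement broken: reset the run
    return start

# Second motif row of Table 4 (k = 2 to 6), as (k, mean, spread).
row = [(2, 410, 3), (3, 483, 6), (4, 483, 5), (5, 481, 5), (6, 482, 6)]
print(plateau_start(row))  # 3
```

On this row the sketch reports a plateau starting at k = 3, which matches the k threshold listed for the Scandinavian dataset in Table 5.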
APPENDIX: Constraint checking algorithms and complexities 

In this Appendix, we briefly describe some possible algorithms for implementing the tests 
corresponding to the above-described constraints. 

Constraints C1 and C3 

Constraint C1 may be implemented by testing whether a switch trial creates as many triangles 
as it destroys. For each arc $(a_i, b_i)$ involved in a switch trial, we may list which oriented triangles 
are being created and destroyed by looking for the out-neighbors of $b_i$ which are also in-neighbors 
of $a_i$ before and after the switch trial. The same goes for C3, except for the fact that triangles 
are not oriented. 

A random link has a probability proportional to $\delta$ to be connected to a node of degree $\delta$, and 
we have to go through the list of neighbors for each neighbor of $b_i$. The same goes with $a_i$, so that 
the comparison of both lists of neighbors eventually has an average complexity in $O(\bar{\delta}^4)$, where $\bar{\delta}$ 
is the mean degree. This yields an overall complexity in $O(n k \bar{\delta}^4)$, where $n$ is the number of trials. 
Note that $\bar{\delta}$ is always equal to 1 in the case of C1. 
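
As an illustration of the triangle test in the undirected case (C3), the following Python sketch computes the net change in the number of triangles caused by a switch. It is a simplified variant that intersects the neighbor sets of the two endpoints of each link rather than following the neighbor-list comparison described above; the adjacency representation (a dict of sets) and the names `build_adj` and `triangle_delta` are assumptions made for this example.

```python
def build_adj(edges):
    """Adjacency sets of an undirected graph given as a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangle_delta(adj, removed, added):
    """Net change in the number of undirected triangles caused by a switch.

    adj is left untouched: only the neighbor sets of nodes involved in the
    switch are copied. Links are removed one by one, then added one by one,
    so a triangle sharing several switched links is counted exactly once.
    """
    local = {}

    def nb(x):
        # Copy-on-first-use view of the neighbors of x.
        if x not in local:
            local[x] = set(adj.get(x, ()))
        return local[x]

    delta = 0
    for u, v in removed:
        nb(u).discard(v)
        nb(v).discard(u)
        delta -= len(nb(u) & nb(v))   # triangles destroyed by removing (u, v)
    for u, v in added:
        delta += len(nb(u) & nb(v))   # triangles created by adding (u, v)
        nb(u).add(v)
        nb(v).add(u)
    return delta

# A switch is acceptable under C3 when triangle_delta(...) == 0.
adj = build_adj([(0, 1), (2, 3), (0, 4), (1, 4), (0, 5), (3, 5)])
print(triangle_delta(adj, removed=[(0, 1), (2, 3)], added=[(0, 3), (2, 1)]))  # 0
```

In the example, the switch destroys the triangle (0, 1, 4) and creates the triangle (0, 3, 5), so the triangle count is preserved.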

Constraint C2 

The test corresponding to this specific constraint can be implemented as follows: after storing 
the out-degree of each node at the beginning of the process, the user checks at each trial that 
for any couple of degrees $(\delta_1, \delta_2)$, links whose extremities have degrees $\delta_1$ and $\delta_2$ are created and 
destroyed in equal numbers. This specific test can be done in constant time, yielding an overall 
time complexity of the algorithm in $O(nk)$. 
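
A possible Python sketch of this test is given below. It assumes directed links represented as (source, target) tuples and treats degree pairs as ordered; whether the pairs should be ordered or not depends on the exact definition of C2, so this is only an illustration. The function name `preserves_degree_pairs` is hypothetical.

```python
from collections import Counter

def preserves_degree_pairs(out_degree, removed, added):
    """True if links are created and destroyed in equal numbers for every
    (degree of source, degree of target) pair.

    out_degree: dict node -> out-degree, stored once before the walk starts
    (degrees are invariant under switches, so it never needs updating).
    removed, added: lists of (source, target) links of one switch trial.
    """
    destroyed = Counter((out_degree[a], out_degree[b]) for a, b in removed)
    created = Counter((out_degree[a], out_degree[b]) for a, b in added)
    return destroyed == created

out_degree = {0: 2, 1: 1, 2: 2, 3: 1}
print(preserves_degree_pairs(out_degree,
                             removed=[(0, 1), (2, 3)],
                             added=[(0, 3), (2, 1)]))  # True
```

The cost of one test only depends on the number k of links involved in the switch, not on the size of the graph.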

Constraint C4 

A very simple (yet not optimal) way to implement this constraint test is to check, for each 
link involved in a switch, the size of the connected component it belongs to before and after the 
switch. This can be done in $O(M)$ by using a breadth-first search algorithm. This induces a global 
complexity in $O(nkM)$. 
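
The breadth-first search test can be sketched in Python as follows, assuming the graph is stored as a dict of neighbor sets before and after the switch. Since components that contain no endpoint of a switched link are unchanged, comparing the sizes of the components reached from these endpoints is enough to compare the whole size distribution. The function names are hypothetical and the code is a simple illustration rather than an optimized implementation.

```python
from collections import deque

def component(adj, start):
    """Set of nodes reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return frozenset(seen)

def touched_component_sizes(adj, nodes):
    """Sorted sizes of the distinct components containing the given nodes."""
    comps = set()
    for x in nodes:
        if not any(x in c for c in comps):
            comps.add(component(adj, x))
    return sorted(len(c) for c in comps)

def preserves_component_sizes(adj_before, adj_after, removed, added):
    """True if the switch leaves the distribution of component sizes unchanged."""
    nodes = {x for link in list(removed) + list(added) for x in link}
    return (touched_component_sizes(adj_before, nodes)
            == touched_component_sizes(adj_after, nodes))

def build_adj(edges):
    """Adjacency sets of an undirected graph given as a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Two 4-cycles merged into a single 8-cycle by a 2-switch: rejected under C4.
before = build_adj([(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)])
after = build_adj([(0, 5), (1, 2), (2, 3), (3, 0), (4, 1), (5, 6), (6, 7), (7, 4)])
print(preserves_component_sizes(before, after,
                                removed=[(0, 1), (4, 5)],
                                added=[(0, 5), (4, 1)]))  # False
```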